Executive summary

This analysis examines how LEGO set releases have evolved over time along several dimensions: number of sets, theme popularity, color variety, and complexity. The data cover sets released from 1949 to 2024, and the focus is on trends and fluctuations in the LEGO product strategy.

The number of sets the LEGO Group releases per year has grown substantially since 2000, with peaks in 2020 and 2021 that coincide with the COVID-19 pandemic.

Product diversity has also grown: the average number of unique colors per set has risen over time, which suggests a shift towards more visually engaging and complex offerings.

Thematic trends reveal that certain series, such as DC Super Heroes and various Minifigures series, dominate release counts, reflecting both sustained popularity and strategic brand partnerships.

An examination of the most popular colors shows a dominance of traditional colors, with black and white leading, suggesting these colors’ foundational roles in set design.

The correlation between set size and color variety is positive, indicating that larger sets tend to offer a greater spectrum of colors, adding to their complexity and appeal.

Basic statistics

Sizes and basic statistics of data sets after removing null values:

Colors dataset:

Dataset size:

## [1] 263   4
##        id             name               rgb              is_trans        
##  Min.   :  -1.0   Length:263         Length:263         Length:263        
##  1st Qu.:  83.0   Class :character   Class :character   Class :character  
##  Median :1005.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 651.4                                                           
##  3rd Qu.:1070.5                                                           
##  Max.   :9999.0

Elements dataset:

Dataset size:

## [1] 60456     4
##    element_id        part_num            color_id        design_id     
##  Min.   :   9327   Length:60456       Min.   :  -1.0   Min.   :  1001  
##  1st Qu.:4565425   Class :character   1st Qu.:  10.0   1st Qu.: 18454  
##  Median :6111350   Mode  :character   Median :  28.0   Median : 41748  
##  Mean   :5517587                      Mean   : 120.4   Mean   : 45570  
##  3rd Qu.:6286413                      3rd Qu.:  85.0   3rd Qu.: 75474  
##  Max.   :6499141                      Max.   :9999.0   Max.   :107520

Inventories dataset:

Dataset size:

## [1] 37265     3
##        id            version         set_num         
##  Min.   :     1   Min.   : 1.000   Length:37265      
##  1st Qu.: 14424   1st Qu.: 1.000   Class :character  
##  Median : 54379   Median : 1.000   Mode  :character  
##  Mean   : 61104   Mean   : 1.091                     
##  3rd Qu.: 88842   3rd Qu.: 1.000                     
##  Max.   :194312   Max.   :16.000

Inventory minifigs dataset:

Dataset size:

## [1] 20858     3
##   inventory_id      fig_num             quantity      
##  Min.   :     3   Length:20858       Min.   :  1.000  
##  1st Qu.:  7869   Class :character   1st Qu.:  1.000  
##  Median : 15681   Mode  :character   Median :  1.000  
##  Mean   : 43010                      Mean   :  1.062  
##  3rd Qu.: 66834                      3rd Qu.:  1.000  
##  Max.   :194312                      Max.   :100.000

Inventory parts dataset:

Dataset size:

## [1] 1180987       6
##   inventory_id      part_num            color_id         quantity      
##  Min.   :     1   Length:1180987     Min.   :  -1.0   Min.   :   1.00  
##  1st Qu.:  9404   Class :character   1st Qu.:   4.0   1st Qu.:   1.00  
##  Median : 22838   Mode  :character   Median :  15.0   Median :   2.00  
##  Mean   : 50849                      Mean   : 131.8   Mean   :   3.37  
##  3rd Qu.: 87088                      3rd Qu.:  71.0   3rd Qu.:   4.00  
##  Max.   :194312                      Max.   :9999.0   Max.   :3064.00  
##    is_spare           img_url         
##  Length:1180987     Length:1180987    
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 

Inventory sets dataset:

Dataset size:

## [1] 4358    3
##   inventory_id      set_num             quantity     
##  Min.   :    35   Length:4358        Min.   : 1.000  
##  1st Qu.:  8076   Class :character   1st Qu.: 1.000  
##  Median : 16423   Mode  :character   Median : 1.000  
##  Mean   : 52519                      Mean   : 1.813  
##  3rd Qu.: 98685                      3rd Qu.: 1.000  
##  Max.   :191576                      Max.   :60.000

Minifigs dataset:

Dataset size:

## [1] 13764     4
##    fig_num              name             num_parts         img_url         
##  Length:13764       Length:13764       Min.   :  0.000   Length:13764      
##  Class :character   Class :character   1st Qu.:  4.000   Class :character  
##  Mode  :character   Mode  :character   Median :  4.000   Mode  :character  
##                                        Mean   :  5.296                     
##                                        3rd Qu.:  5.000                     
##                                        Max.   :156.000

Part categories dataset:

Dataset size:

## [1] 66  2
##        id            name          
##  Min.   : 1.00   Length:66         
##  1st Qu.:19.25   Class :character  
##  Median :35.50   Mode  :character  
##  Mean   :35.36                     
##  3rd Qu.:51.75                     
##  Max.   :68.00

Part relationships dataset:

Dataset size:

## [1] 29977     3
##    rel_type         child_part_num     parent_part_num   
##  Length:29977       Length:29977       Length:29977      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character

Parts dataset:

Dataset size:

## [1] 52615     4
##    part_num             name            part_cat_id    part_material     
##  Length:52615       Length:52615       Min.   : 1.00   Length:52615      
##  Class :character   Class :character   1st Qu.:17.00   Class :character  
##  Mode  :character   Mode  :character   Median :41.00   Mode  :character  
##                                        Mean   :38.91                     
##                                        3rd Qu.:60.00                     
##                                        Max.   :68.00

Sets dataset:

Dataset size:

## [1] 21880     6
##    set_num              name                year         theme_id  
##  Length:21880       Length:21880       Min.   :1949   Min.   :  1  
##  Class :character   Class :character   1st Qu.:2001   1st Qu.:273  
##  Mode  :character   Mode  :character   Median :2012   Median :497  
##                                        Mean   :2008   Mean   :442  
##                                        3rd Qu.:2018   3rd Qu.:608  
##                                        Max.   :2024   Max.   :752  
##    num_parts         img_url         
##  Min.   :    0.0   Length:21880      
##  1st Qu.:    3.0   Class :character  
##  Median :   31.0   Mode  :character  
##  Mean   :  161.4                     
##  3rd Qu.:  139.0                     
##  Max.   :11695.0

Themes dataset:

Dataset size:

## [1] 323   3
##        id            name             parent_id    
##  Min.   :  3.0   Length:323         Min.   :  1.0  
##  1st Qu.:205.0   Class :character   1st Qu.:186.0  
##  Median :469.0   Mode  :character   Median :411.0  
##  Mean   :419.9                      Mean   :360.6  
##  3rd Qu.:632.5                      3rd Qu.:512.5  
##  Max.   :751.0                      Max.   :697.0
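The size and summary output in this section can be reproduced along the following lines. This is a minimal sketch, not the report's original code: the `themes` data frame below is a hypothetical toy stand-in for the real file, and `na.omit()` is assumed as the way null values were removed.

```r
# Hypothetical toy stand-in for the themes data; the values are illustrative only.
themes <- data.frame(
  id = c(1, 2, 3, 4),
  name = c("Technic", "Arctic Technic", "Town", "Classic Town"),
  parent_id = c(NA, 1, NA, 3)
)

# Assumption: "removing null values" means dropping every row with at least one NA.
themes_clean <- na.omit(themes)

dim(themes_clean)      # number of rows and columns left after cleaning
summary(themes_clean)  # per-column summary in the same shape as the output above
```

Under this approach, a row with a missing value in any column is dropped entirely, so the reported sizes can be smaller than the raw files.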

Attributes analysis

Number of sets released over years based on inventory sets quantity

Conclusions:

We can observe a rapid increase in the number of sets released after 2000. The graph shows that the release frequency became more volatile, with significant peaks and troughs.

In the most recent years shown on the graph, the number of sets rises sharply and then declines. This could reflect factors such as market strategy or changes driven by the COVID-19 pandemic: both 2020 and 2021 show a marked increase in the number of sets.
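One way to obtain a per-year count "based on inventory sets quantity" is to attach the release year to each inventory sets row and sum the quantities. The sketch below assumes that interpretation and uses hypothetical toy data with the column names from the summaries above.

```r
# Hypothetical toy data; column names follow the sets and inventory sets tables.
sets <- data.frame(
  set_num = c("A-1", "B-1", "C-1"),
  year = c(2019, 2020, 2020)
)
inventory_sets <- data.frame(
  inventory_id = c(10, 11, 12, 13),
  set_num = c("A-1", "B-1", "B-1", "C-1"),
  quantity = c(1, 2, 1, 3)
)

# Attach the release year to every inventory row, then sum quantities per year.
joined <- merge(inventory_sets, sets, by = "set_num")
sets_per_year <- aggregate(quantity ~ year, data = joined, FUN = sum)

# The ten years with the highest totals, largest first.
top_years <- head(sets_per_year[order(-sets_per_year$quantity), ], 10)
```

Note that `merge()` performs an inner join by default, so inventory rows whose `set_num` has no match in `sets` are silently excluded from the totals.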

Top 10 years with most sets released based on inventory sets quantity

Color variety in sets over the years

Conclusions:

The plot shows a clear upward trend in the average number of unique colors used in Lego sets from around the 1950s to the present. Notably, there is a significant increase starting in the early 2000s, where the average number of unique colors per set rises more steeply compared to previous decades.

This trend could be indicative of Lego’s strategy to make sets more appealing and varied, perhaps in response to market demands for more intricate and visually stimulating products.
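The average number of unique colors per set could be computed roughly as follows. This is a sketch under stated assumptions: the toy data are hypothetical, inventory versions are ignored, and each set is linked to its parts through the inventories table.

```r
# Hypothetical toy data; column names follow the tables summarised earlier.
sets <- data.frame(set_num = c("A-1", "B-1", "C-1"), year = c(1960, 2020, 2020))
inventories <- data.frame(id = c(1, 2, 3), set_num = c("A-1", "B-1", "C-1"))
inventory_parts <- data.frame(
  inventory_id = c(1, 1, 2, 2, 2, 3, 3, 3, 3),
  color_id     = c(0, 0, 0, 4, 15, 0, 1, 4, 4)
)

# Link each part row to its set via the inventory id.
parts <- merge(inventory_parts, inventories, by.x = "inventory_id", by.y = "id")

# Count the distinct colors used in each set.
colors_per_set <- aggregate(color_id ~ set_num, data = parts,
                            FUN = function(x) length(unique(x)))
names(colors_per_set)[2] <- "n_colors"

# Average that count over all sets released in the same year.
colors_per_set <- merge(colors_per_set, sets, by = "set_num")
avg_colors <- aggregate(n_colors ~ year, data = colors_per_set, FUN = mean)
```

Counting distinct `color_id` values rather than rows matters here, because the same color usually appears on many parts of one set.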

Distribution of the number of minifigs in set - 10 sets with most minifigures

## # A tibble: 10 × 4
##        id set_num  num_minifigs name                                            
##     <int> <chr>           <int> <chr>                                           
##  1   2579 9293-1             29 Community Workers                               
##  2   2267 852293-1           28 Fantasy Era Castle Giant Chess Set              
##  3 100622 76178-1            25 Daily Bugle                                     
##  4   5411 1063-1             24 Community Workers                               
##  5   7869 75159-1            23 Death Star                                      
##  6   2154 9349-1             22 Fairytale and Historic Minifigures              
##  7   7649 9348-1             22 Community Minifigures                           
##  8  10402 3425-2             22 Grand Championship Cup                          
##  9  10538 3425-1             22 Grand Championship Cup - U.S. Men's Team Cup Ed…
## 10  85198 71741-1            22 NINJAGO City Gardens
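A ranking like the one above can be built by summing minifigure quantities per inventory and mapping inventories to sets. The sketch below uses hypothetical toy data and assumes `num_minifigs` is the sum of `quantity`; the report may instead have counted distinct figures.

```r
# Hypothetical toy data; column names follow the inventory minifigs table.
inventories <- data.frame(id = c(1, 2, 3), set_num = c("A-1", "B-1", "C-1"))
inventory_minifigs <- data.frame(
  inventory_id = c(1, 1, 2, 3, 3, 3),
  fig_num = c("fig-1", "fig-2", "fig-1", "fig-3", "fig-4", "fig-5"),
  quantity = c(2, 1, 1, 4, 1, 1)
)

# Map each minifigure row to its set, then total the quantities per set.
figs <- merge(inventory_minifigs, inventories, by.x = "inventory_id", by.y = "id")
minifigs_per_set <- aggregate(quantity ~ set_num, data = figs, FUN = sum)
names(minifigs_per_set)[2] <- "num_minifigs"

# Keep the ten sets with the most minifigures, largest first.
top_sets <- head(minifigs_per_set[order(-minifigs_per_set$num_minifigs), ], 10)
```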

Number of minifigures included per set over time

## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).

Conclusions:

For a significant period, specifically from the early years displayed up to around the late 1970s, the average number of minifigures per set remained relatively constant and close to 1.

Starting from the early 1980s, there is noticeable variability, with the average number of minifigures per set fluctuating more significantly. The fluctuations appear to be somewhat cyclical with peaks and troughs.

The trend becomes more pronounced in later years, with the variability increasing, which could be indicative of more diverse set offerings, special editions, or changes in set design philosophy.
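The yearly average discussed above can be derived from per-set minifigure counts as sketched below. The toy data are hypothetical, and the inner join is an assumption: it means only sets that contain at least one minifigure contribute to the average.

```r
# Hypothetical toy data: one row per set with its total minifigure count.
sets <- data.frame(set_num = c("A-1", "B-1", "C-1", "D-1"),
                   year = c(1978, 2020, 2020, 2020))
minifigs_per_set <- data.frame(set_num = c("A-1", "B-1", "C-1"),
                               num_minifigs = c(1, 2, 6))

# Inner join: set "D-1" has no minifigures and drops out of the average.
per_year <- merge(minifigs_per_set, sets, by = "set_num")

# Mean number of minifigures per set for each release year.
avg_minifigs <- aggregate(num_minifigs ~ year, data = per_year, FUN = mean)
```

Including sets without minifigures (a left join with zeros filled in) would lower the averages, so the choice of join changes how the curve should be read.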

Variable correlations

Correlation between size of a set and number of colors in a set

Conclusion:

The heatmap suggests a positive correlation between the size of a Lego set (measured by the number of parts) and its color diversity (measured by the number of unique colors). Sets with more parts also tend to have more distinct colors.
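A correlation coefficient can complement the visual impression from the heatmap. The sketch below uses a hypothetical per-set table and is not taken from the report. It computes both the Pearson coefficient and the rank-based Spearman coefficient, which is less sensitive to a few very large sets.

```r
# Hypothetical per-set summary: part count and number of distinct colors.
set_stats <- data.frame(
  num_parts = c(10, 50, 200, 1000, 4000),
  n_colors  = c(2, 5, 9, 20, 31)
)

# Pearson measures linear association on the raw values.
pearson <- cor(set_stats$num_parts, set_stats$n_colors)

# Spearman measures monotonic association on the ranks.
spearman <- cor(set_stats$num_parts, set_stats$n_colors, method = "spearman")
```

Because part counts are heavily right-skewed (median 31, maximum 11695 in the sets summary), the two coefficients can differ noticeably on the real data.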

Correlation between the number of parts in a set and the complexity of a set

The complexity of a set can be approximated by the number of unique part categories used in it.

## `geom_smooth()` using formula = 'y ~ x'

## [1] 0.5391932

Conclusion:

The scatter plot reveals a positive correlation between the number of parts and set complexity; the coefficient printed above, about 0.54, indicates a moderate relationship. As the number of parts in a set increases, the number of unique part categories tends to increase as well, suggesting that larger sets are generally more complex. The relationship is most visible for sets with a small number of parts, as indicated by the dense cluster of points near the origin, where complexity rises quickly with the number of parts.

For sets with a very high number of parts (toward the right end of the X-axis), the data points become more spread out, indicating more variability in complexity for these larger sets. It suggests that once a set reaches a certain size, the addition of more parts does not necessarily increase complexity at the same rate. This could be due to the use of repeated parts within these large sets or a design choice to not increase complexity despite a higher part count.
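The complexity proxy described in this section, the number of unique part categories per set, could be computed and correlated with set size as sketched below. The toy data are hypothetical and inventory versions are ignored.

```r
# Hypothetical toy data; column names follow the tables summarised earlier.
sets <- data.frame(set_num = c("A-1", "B-1", "C-1"), num_parts = c(3, 5, 8))
inventories <- data.frame(id = c(1, 2, 3), set_num = c("A-1", "B-1", "C-1"))
parts <- data.frame(part_num = c("p1", "p2", "p3", "p4"),
                    part_cat_id = c(11, 11, 14, 20))
inventory_parts <- data.frame(
  inventory_id = c(1, 1, 2, 2, 3, 3, 3),
  part_num = c("p1", "p2", "p1", "p3", "p1", "p3", "p4")
)

# Attach a category to every part row, then the owning set.
with_cat <- merge(inventory_parts, parts, by = "part_num")
with_set <- merge(with_cat, inventories, by.x = "inventory_id", by.y = "id")

# Complexity proxy: number of distinct part categories in each set.
complexity <- aggregate(part_cat_id ~ set_num, data = with_set,
                        FUN = function(x) length(unique(x)))
names(complexity)[2] <- "n_categories"

# Correlate the proxy with the set's part count.
complexity <- merge(complexity, sets, by = "set_num")
r <- cor(complexity$num_parts, complexity$n_categories)
```

Since the number of part categories is bounded (the part categories table lists 66), this proxy has to level off for very large sets, which is consistent with the flattening described above.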